Abstract

Many applications, such as pointer analysis and incremental compilation,
require maintaining a topological ordering of the nodes of a directed
acyclic graph (DAG) under dynamic updates. All known algorithms for this
problem are either only analyzed for worst-case insertion sequences or only
evaluated experimentally on random DAGs. We present the first average-case
analysis of online topological ordering algorithms. We prove an expected
runtime of $O(n^2 \operatorname{polylog}(n))$ under insertion of the edges of a
complete DAG in a random order for the algorithms of Alpern et al. (SODA, 1990),
Katriel and Bodlaender (TALG, 2006), and Pearce and Kelly (JEA, 2006). This is
much less than the best known worst-case bound $O(n^{2.75})$ for this problem.

1 Introduction 

There has been a growing interest in dynamic graph algorithms over the 
last two decades due to their applications in a variety of contexts includ- 
ing operating systems, information systems, network management, assembly 
planning, VLSI design and graphical applications. Typical dynamic graph 
algorithms maintain a certain property (e.g., connectivity information) of
a graph that changes (a new edge inserted or an existing edge deleted)
dynamically over time. An algorithm or a problem is called fully dynamic if
both edge insertions and deletions are allowed, and it is called partially
dynamic if only one (either only insertion or only deletion) is allowed. If only
insertions are allowed, the partially dynamic algorithm is called incremen- 
tal; if only deletions are allowed, it is called decremental. While a number 
of fully dynamic algorithms have been obtained for various properties on 
undirected graphs (see [10] and references therein), the design and analy- 
sis of fully dynamic algorithms for directed graphs has turned out to be 
much harder (e.g., [13, 24-26]). Much of the research on directed graphs 
is therefore concentrated on the design of partially dynamic algorithms in- 
stead (e.g., [3, 7, 14]). In this paper, we focus on the analysis of algorithms 
for maintaining a topological ordering of directed graphs in an incremental 
setting. 

A topological order $T$ of a directed graph $G = (V, E)$ (with $n := |V|$ and
$m := |E|$) is a linear ordering of its nodes such that for all directed paths
from $x \in V$ to $y \in V$ ($x \neq y$), it holds that $T(x) < T(y)$. A directed
graph has a topological ordering if and only if it is acyclic. There are
well-known algorithms for computing the topological ordering of a directed
acyclic graph (DAG) in $O(m + n)$ time in an offline setting (see e.g. [8]). In a
fully dynamic setting, each time an edge is added to or deleted from the DAG, we
are required to update the bijective mapping $T$. In the online/incremental
variant of this problem, the edges of the DAG are not known in advance but are
inserted one at a time (no deletions allowed). As the topological order remains
valid when removing edges, most algorithms for online topological ordering can
also handle the fully dynamic setting. However, there are no good bounds
known for the fully dynamic case. Most algorithms are only analyzed in the
online setting.
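
As a point of reference for the offline $O(m + n)$ bound mentioned above, the following is a minimal sketch of one well-known offline method (Kahn's algorithm). It is not taken from [8]; the function name and the 0-based positions (instead of $[1..n]$) are assumptions made for illustration.

```python
from collections import deque

def topological_order(n, edges):
    """Return pos with pos[x] = T(x) (0-based), or None if the graph is cyclic."""
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Nodes without incoming edges can be placed first.
    queue = deque(x for x in range(n) if indeg[x] == 0)
    pos = [None] * n
    placed = 0
    while queue:
        x = queue.popleft()
        pos[x] = placed
        placed += 1
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)
    # If some node was never placed, it lies on a cycle.
    return pos if placed == n else None
```

Recomputing the order this way after every insertion is the naive online strategy; the algorithms analyzed in this paper avoid that by updating the order locally.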

Given an arbitrary sequence of edges, the online cycle detection problem 
is to discover the first edge which introduces a cycle. So far, the best
known algorithm for this problem involves maintaining an online topological 
order and returning the edge after which no valid topological order exists. 
Hence, results for online topological ordering also translate into results for 
the online cycle detection problem. Online topological ordering is required 
for incremental evaluation of computational circuits [2] and in incremental 
compilation [16, 18] where a dependency graph between modules is main- 
tained to reduce the amount of recompilation performed when an update 
occurs. An application for online cycle detection is pointer analysis [21]. 

For inserting $m$ edges, the naive way of computing an online topological
order each time from scratch with the offline algorithm takes $O(m^2 + mn)$
time. Marchetti-Spaccamela, Nanni, and Rohnert [17] gave an algorithm
that can insert $m$ edges in $O(mn)$ time. Alpern, Hoover, Rosen,
Sweeney, and Zadeck (AHRSZ) proposed an algorithm [2] which runs in
$O(|\rangle K \langle| \log |\rangle K \langle|)$ time per edge insertion, with
$|\rangle K \langle|$ being a local measure of the insertion complexity. However,
there is no analysis of AHRSZ for a sequence of edge insertions. Katriel and
Bodlaender (KB) [14] analyzed a variant of the AHRSZ algorithm and obtained an
upper bound of $O(\min\{m^{3/2} \log n, m^{3/2} + n^2 \log n\})$ for inserting an
arbitrary sequence of $m$ edges. The algorithm by Pearce and Kelly (PK) [19]
empirically outperforms the other algorithms for random edge insertions leading
to sparse random DAGs, although its worst-case runtime is inferior to KB. Ajwani,
Friedrich, and Meyer (AFM) [1] proposed a new algorithm with runtime
$O(n^{2.75})$, which asymptotically outperforms KB on dense DAGs.

As noted above, the empirical performance of the above algorithms on random
edge insertion sequences (REIS) is quite different from their worst-case
behavior. While PK performs empirically better on REIS, KB and AFM
are the best known algorithms for worst-case sequences. This leads us to the
theoretical study of online topological ordering algorithms on REIS. A nice
property of such an average-case analysis is that (in contrast to worst-case
bounds) the average of experimental results on REIS converges towards the
real average after sufficiently many iterations. This can give a good indication
of the tightness of the proven theoretical bounds.

Our contributions are as follows: 

• We show an expected runtime of $O(n^2 \log^2 n)$ for inserting all edges of
a complete DAG in a random order with PK (cf. Section 4).

• For AHRSZ and KB, we show an expected runtime of $O(n^2 \log^3 n)$
for complete random edge insertion sequences (cf. Section 5). This is
significantly better than the known worst-case bound of $O(n^3)$ for KB
to insert $\Omega(n^2)$ edges.

• Additionally, we show that for such edge insertion sequences, the
expected number of edges which force any algorithm to change the
topological order ("invalidating edges") is $O(n^{3/2} \log^{1/2} n)$
(cf. Section 6), which is the first such result.

The remainder of this paper is organized as follows. The next section
briefly describes the three algorithms AHRSZ, KB, and PK. In Section 3 we
specify the random graph models used in our analysis. Sections 4-6 prove
our upper bounds for the runtime of the three algorithms and the number of
invalidating edges. Section 7 presents an empirical study, which provides a
deeper insight into the average-case behavior of AHRSZ and PK.

2 Algorithms 

This section first introduces some notation and then describes the three
algorithms AHRSZ, KB, and PK. We keep the current topological order as
a bijective function $T : V \to [1..n]$. In this and the subsequent sections, we
will use the following notation: $d(u, v)$ denotes $|T(u) - T(v)|$, $u < v$ is a
short form of $T(u) < T(v)$, $u \to v$ denotes an edge from $u$ to $v$, and
$u \leadsto v$ expresses that $v$ is reachable from $u$. Note that $u \leadsto u$,
but not $u \to u$. The degree of a node is the sum of its in- and out-degree.

Consider the $i$-th edge insertion $u \to v$. We say that an edge insertion is
invalidating if $u > v$ before the insertion of this edge. We define
$R_B^{(i)} := \{x \in V \mid v < x \wedge x \leadsto u\}$,
$R_F^{(i)} := \{y \in V \mid y < u \wedge v \leadsto y\}$ and
$\delta^{(i)} := R_F^{(i)} \cup R_B^{(i)}$. Let $|\delta^{(i)}|$ denote the number
of nodes in $\delta^{(i)}$ and let $\|\delta^{(i)}\|$ denote the number of
edges incident to nodes of $\delta^{(i)}$. Note that $\delta^{(i)}$ as defined
above is different from the adaptive parameter $\delta$ of the bounded
incremental computation model. If an edge is non-invalidating, then
$|R_B^{(i)}| = |R_F^{(i)}| = |\delta^{(i)}| = 0$. Note that for an invalidating
edge, $R_F^{(i)} \cap R_B^{(i)} = \emptyset$, as otherwise the algorithms will just
report a cycle and terminate.

We now describe the insertion of the $i$-th edge $u \to v$ for all three
algorithms. Assume for the remainder of this section that $u \to v$ is an
invalidating edge, as otherwise none of the algorithms does anything for that
edge. We define an algorithm to be local if it only changes the ordering
of nodes $x$ with $v \le x \le u$ to compute the new topological order $T'$ of
$G \cup \{(u, v)\}$. All three algorithms are local and they work in two phases:
a "discovery phase" and a "relabeling phase".

In the discovery phase of PK, the set $\delta^{(i)}$ is identified using a forward
depth-first search from $v$ (giving the set $R_F^{(i)}$) and a backward
depth-first search from $u$ (giving the set $R_B^{(i)}$). The relabeling phase is
also very simple. It sorts both sets $R_F^{(i)}$ and $R_B^{(i)}$ separately in
increasing topological order and then allocates new priorities according to the
relative position in the sequence $R_B^{(i)}$ followed by $R_F^{(i)}$. It does not
alter the priority of any node not in $\delta^{(i)}$, thereby greatly simplifying
the relabeling phase. The runtime of PK for a single edge insertion is
$\Theta(\|\delta^{(i)}\| + |\delta^{(i)}| \log |\delta^{(i)}|)$.
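
The two phases of PK described above can be sketched as follows. This is an illustrative reading of the description in this section, not the implementation of [19]; the class name `PKOrder`, the 0-based positions, and the use of Python sets for adjacency are assumptions.

```python
class PKOrder:
    """Maintains pos[x] = T(x) (0-based) under edge insertions, PK-style."""

    def __init__(self, n):
        self.succ = [set() for _ in range(n)]
        self.pred = [set() for _ in range(n)]
        self.pos = list(range(n))  # arbitrary initial order

    def _search(self, start, adj, lo, hi):
        # Iterative DFS that only enters nodes with position in [lo, hi].
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen and lo <= self.pos[y] <= hi:
                    seen.add(y)
                    stack.append(y)
        return seen

    def insert(self, u, v):
        """Insert u -> v and return |delta| (0 for a non-invalidating edge)."""
        lo, hi = self.pos[v], self.pos[u]
        r_f, r_b = set(), set()
        if lo <= hi:  # u is not before v: invalidating edge (or self-loop)
            r_f = self._search(v, self.succ, lo, hi)  # discovery, forward
            if u in r_f:
                raise ValueError("edge would create a cycle")
            r_b = self._search(u, self.pred, lo, hi)  # discovery, backward
            # Relabeling: R_B followed by R_F, each in its old relative order,
            # reusing exactly the positions the affected nodes occupied.
            nodes = (sorted(r_b, key=lambda x: self.pos[x])
                     + sorted(r_f, key=lambda x: self.pos[x]))
            slots = sorted(self.pos[x] for x in nodes)
            for x, p in zip(nodes, slots):
                self.pos[x] = p
        self.succ[u].add(v)
        self.pred[v].add(u)
        return len(r_f) + len(r_b)
```

Because only the positions already held by nodes of $\delta^{(i)}$ are reassigned, all other nodes keep their priorities, which matches the locality property defined above. The returned value is $|R_F^{(i)}| + |R_B^{(i)}| = |\delta^{(i)}|$.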

Alpern et al. [2] used the bounded incremental computation model [24]
and introduced the measure $|\rangle K \langle|$. For an invalidated topological
order $T$, the set $K \subseteq V$ is a cover if for all $x, y \in V$:
$(x \leadsto y \wedge y < x \Rightarrow x \in K \vee y \in K)$. This states that
for any connected $x$ and $y$ which are incorrectly ordered, a cover $K$ must
include $x$ or $y$ or both. $|K|$ and $\|K\|$ denote the number of nodes and the
number of edges touching nodes in $K$, respectively. We define
$|\rangle K \langle| := |K| + \|K\|$ and a cover $K$ to be minimal if
$|\rangle K \langle| \le |\rangle K' \langle|$ for any other cover $K'$. Thus,
$|\rangle K_{\min} \langle|$ for a minimal cover $K_{\min}$ captures the minimal
amount of work required to calculate the new topological order $T'$ of
$G \cup \{(u,v)\}$ assuming that the algorithm is local and that the adjacent
edges must be traversed.
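
To make the cover definition concrete, the following brute-force sketch checks the cover property and evaluates $|K| + \|K\|$ on a small instance. It is meant only as an illustration of the definition; the function names are hypothetical, and treating the newly inserted edge as part of the edge set is an assumption.

```python
def reachable_pairs(n, edges):
    """All ordered pairs (x, y), x != y, with a directed path from x to y."""
    succ = [[] for _ in range(n)]
    for a, b in edges:
        succ[a].append(b)
    pairs = set()
    for x in range(n):
        stack, seen = [x], {x}
        while stack:
            for y in succ[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
                    pairs.add((x, y))
    return pairs

def is_cover(K, pos, n, edges):
    """True if every misordered reachable pair has an endpoint in K."""
    return all(x in K or y in K
               for x, y in reachable_pairs(n, edges)
               if pos[y] < pos[x])

def extended_size(K, edges):
    """|K| plus the number of edges touching a node of K."""
    return len(K) + sum(1 for a, b in edges if a in K or b in K)

# Order 0 < 1 < 2 < 3, existing edge 0 -> 1, newly inserted edge 3 -> 0.
pos = [0, 1, 2, 3]
edges = [(0, 1), (3, 0)]
print(is_cover({3}, pos, 4, edges), extended_size({3}, edges))
print(is_cover({0, 1}, pos, 4, edges), extended_size({0, 1}, edges))
print(is_cover({0}, pos, 4, edges))
```

In this instance the misordered pairs are $(3, 0)$ and $(3, 1)$, so $\{3\}$ and $\{0, 1\}$ are both covers while $\{0\}$ is not; $\{3\}$ is the cheaper of the two under the measure $|K| + \|K\|$.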

AHRSZ's discovery phase marks the nodes of a cover $K$ by marking
some of the unmarked nodes $x, y \in \delta^{(i)}$ with $x \leadsto y$ and $y < x$.
This is done recursively by moving two frontiers starting from $v$ and $u$ towards
each other. Here, the crucial decision is which frontier to move next. AHRSZ
tries to minimize $\|K\|$ by balancing the number of edges seen on both sides of
the frontier. The recursion stops when the forward and backward frontiers meet.
Note that we do not necessarily visit all nodes in $R_F^{(i)}$ ($R_B^{(i)}$) while
extending the forward frontier (backward frontier). It can be proven [2] that the
marked nodes indeed form a cover $K$ and that
$|\rangle K \langle| \le 3\,|\rangle K_{\min} \langle|$.

The relabeling phase employs the dynamic priority space data structure
due to Dietz and Sleator [9]. This permits new priorities to be created
between existing ones in $O(1)$ amortized time. The relabeling is done in two
passes over the nodes in $K$. During the first pass, it visits the nodes of $K$ in
reverse topological order and computes a strict upper bound on the new priorities
to be assigned to each node. In the second pass, it visits the nodes in $K$ in
topological order and computes a strict lower bound on the new priorities.
Together, the two bounds allow new priorities to be assigned to each node in $K$.
Thereafter, the number of different labels used is minimized to speed up the
operations on the priority space data structure in practice. It can be proven that
the discovery phase with $|\rangle K \langle|$ priority queue operations dominates
the time complexity, giving an overall bound of
$O(|\rangle K \langle| \log |\rangle K \langle|)$.

KB is a slight modification of AHRSZ. In the discovery phase AHRSZ
counts the total number of edges incident on a node. KB counts instead
only the in-degree of the backward frontier nodes and only the out-degree
of the forward frontier nodes. In addition, KB also simplified the relabeling
phase. The nodes visited during the extension of the forward (backward)
frontier are deleted from the dynamic priority space data structure and are
reinserted, in the same relative order among themselves, after (before) all
nodes in $R_B^{(i)}$ ($R_F^{(i)}$) not visited during the backward (forward)
frontier extension. The algorithm thus computes a cover $K \subseteq \delta^{(i)}$
and its complexity per edge insertion is
$O(|\rangle K \langle| \log |\rangle K \langle|)$. The worst-case running time of
KB for a sequence of $m$ edge insertions is
$O(\min\{m^{3/2} \log n, m^{3/2} + n^2 \log n\})$.



3 Random Graph Model 

Erdős and Rényi [11, 12] introduced and popularized random graphs. They
defined two closely related models: $G(n,p)$ and $G(n, M)$. The $G(n,p)$ model
($0 < p < 1$) consists of a graph with $n$ nodes in which each edge is chosen
independently with probability $p$. On the other hand, the $G(n, M)$ model
assigns equal probability to all graphs with $n$ nodes and exactly $M$ edges.
Each such graph occurs with a probability of $1 / \binom{N}{M}$, where
$N := \binom{n}{2}$.

For our study of online topological ordering algorithms, we use the random
DAG model of Barak and Erdős [4]. They obtain a random DAG by
directing the edges of an undirected random graph from lower to higher
indexed vertices. Depending on the underlying random graph model, this
defines the $DAG(n,p)$ and $DAG(n,M)$ model. We will mainly work on the
$DAG(n, M)$ model since it is better suited to describe incremental addition
of edges.

The set of all DAGs with $n$ nodes is denoted by $DAG_n$. For a random
variable $f$ with probability space $DAG_n$, $E_M[f]$ and $E_p[f]$ denote the
expected value in the $DAG(n, M)$ and $DAG(n,p)$ model, respectively. For the
remainder of this paper, we set $E[f] := E_M[f]$ and $q := 1 - p$.

The following theorem shows that in most investigations the models
$DAG(n,p)$ and $DAG(n, M)$ are practically interchangeable, provided $M$ is
close to $pN$.

Theorem 1. Given a function $f : DAG_n \to [0, a]$ with $a > 0$ and
$f(G) \le f(H)$ for all $G \subseteq H$, and functions $p$ and $M$ of $n$ with
$0 < p < 1$ and $M \in \mathbb{N}$.

1. If $\lim_{n\to\infty} pqN = \lim_{n\to\infty} \frac{pN - M}{\sqrt{pqN}} = \infty$, then $E_M[f] \le E_p[f] + o(1)$.

2. If $\lim_{n\to\infty} pqN = \lim_{n\to\infty} \frac{M - pN}{\sqrt{pqN}} = \infty$, then $E_p[f] \le E_M[f] + o(1)$.

The analogous theorem for the undirected graph models $G(n,p)$ and
$G(n, M)$ is well known. A closer look at the proof for it given by Bollobás
[6] reveals that the probabilistic argument used to show the close connection
between $G(n,p)$ and $G(n, M)$ can be applied in the same manner for the two
random DAG models $DAG(n,p)$ and $DAG(n, M)$.

We define a random edge sequence to be a uniform random permutation
of the edges of a complete DAG, i.e., all permutations of the $\binom{n}{2}$ edges
are equally likely. If the edges appear to the online algorithm in the order in
which they appear in the random edge sequence, we call it a random edge insertion
sequence (REIS). Note that a DAG obtained after inserting $M$ edges of a
REIS will have the same probability distribution as $DAG(n, M)$. To simplify
the proofs, we first show our results in the $DAG(n,p)$ model and then transfer
them to the $DAG(n, M)$ model by Theorem 1.
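
The random models of this section are straightforward to sample. The sketch below, with hypothetical function names and nodes numbered $0..n-1$, draws from $DAG(n,p)$, generates a random edge sequence, and obtains a $DAG(n, M)$ sample as the length-$M$ prefix of such a sequence.

```python
import itertools
import random

def complete_dag_edges(n):
    """Edges (i, j) with i < j: the complete DAG on nodes 0..n-1."""
    return list(itertools.combinations(range(n), 2))

def dag_np(n, p, rng):
    """DAG(n, p): keep each edge of the complete DAG independently with prob. p."""
    return [e for e in complete_dag_edges(n) if rng.random() < p]

def random_edge_sequence(n, rng):
    """Uniformly random permutation of all N = n(n-1)/2 edges."""
    edges = complete_dag_edges(n)
    rng.shuffle(edges)
    return edges

def dag_nm(n, M, rng):
    """DAG(n, M): the first M edges of a random edge sequence."""
    return random_edge_sequence(n, rng)[:M]
```

When feeding such a sequence to an online algorithm whose initial order is $0, 1, \ldots, n-1$, one would presumably first relabel the nodes by a random permutation; otherwise the initial order is already valid for every prefix and no insertion is ever invalidating.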



4 Analysis of PK 

When inserting the $i$-th edge $u \to v$, PK only regards nodes in
$\delta^{(i)} := \{x \in V \mid v \le x \le u \wedge (v \leadsto x \vee x \leadsto u)\}$
with the order defined according to the current topological order. As discussed in
Section 2, PK performs
$\Theta(\|\delta^{(i)}\| + |\delta^{(i)}| \log |\delta^{(i)}|)$ operations for
inserting the $i$-th edge. The intuition behind the proofs in this section is that
in the early phase of edge insertions (the first $O(n \log n)$ edges), the graph is
sparse and so only a few edges are traversed during the DFS traversals. As the
graph grows, fewer and fewer nodes are visited in DFS traversals
($|\delta^{(i)}|$ is small) and so the total number of edges traversed in DFS
traversals (bounded above by $\|\delta^{(i)}\|$) is still small.

Theorems 4 and 10 of this section show for a random edge insertion
sequence (REIS) of $N$ edges that $\sum_{i=1}^{N} |\delta^{(i)}| = O(n^2)$ and
$E\left[\sum_{i=1}^{N} \|\delta^{(i)}\|\right] = O(n^2 \log^2 n)$. This proves the
following theorem.

Theorem 2. For a random edge insertion sequence (REIS) leading to a
complete DAG, the expected runtime of PK is $O(n^2 \log^2 n)$.

A comparable pair (of nodes) are two distinct nodes $x$ and $y$ such that
either $x \leadsto y$ or $y \leadsto x$. We define a potential function $\Phi_i$
similar to Katriel and Bodlaender [14]. Let $\Phi_i$ be the number of comparable
pairs after the insertion of $i$ edges. Clearly,

$$\Delta\Phi_i := \Phi_i - \Phi_{i-1} \ge 0 \text{ for all } 1 \le i \le M, \quad \Phi_0 = 0, \quad \text{and } \Phi_M \le n(n-1)/2. \qquad (1)$$

Theorem 3. For all edge sequences, (i) $|\delta^{(i)}| \le \Delta\Phi_i + 1$ and
(ii) $|\delta^{(i)}| \le 2\Delta\Phi_i$.

Proof. Consider the $i$-th edge $(u,v)$. If $u < v$, the theorem is trivial since
$|\delta^{(i)}| = 0$. Otherwise, each vertex of $R_F^{(i)}$ and $R_B^{(i)}$ (as
defined in Section 2) gets newly ordered with respect to $u$ and $v$, respectively.
The set $\bigcup_{x \in R_B^{(i)}} (x, v) \cap \bigcup_{x \in R_F^{(i)}} (u, x) = \{(u \to v)\}$.
This means that overall at least $|R_F^{(i)}| + |R_B^{(i)}| - 1$ node pairs get
newly ordered:

$$\Delta\Phi_i \ge |R_F^{(i)}| + |R_B^{(i)}| - 1 = |\delta^{(i)}| - 1.$$

Also, since in this case $\Delta\Phi_i \ge 1$, $|\delta^{(i)}| \le 2\Delta\Phi_i$. □

Theorem 4. For all edge sequences, $\sum_{i=1}^{N} |\delta^{(i)}| \le n(n-1) = O(n^2)$.

Proof. By Theorem 3 (i), we get

$$\sum_{i=1}^{N} |\delta^{(i)}| \le \sum_{i=1}^{N} (\Delta\Phi_i + 1) = \Phi_N + N \le n(n-1)/2 + n(n-1)/2 = n(n-1). \qquad □$$
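
The potential function used in Theorems 3 and 4 can be evaluated by brute force on small instances. The following sketch (function name hypothetical, nodes numbered $0..n-1$, input assumed acyclic) counts comparable pairs by a graph search from every node.

```python
def comparable_pairs(n, edges):
    """Phi: number of unordered pairs {x, y} joined by a directed path."""
    succ = [[] for _ in range(n)]
    for a, b in edges:
        succ[a].append(b)
    count = 0
    for x in range(n):
        seen, stack = {x}, [x]
        while stack:
            for y in succ[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        # In a DAG each comparable pair is counted once, from its source side.
        count += len(seen) - 1
    return count

# Potential after each prefix of the insertion sequence 0->1, 2->3, 1->2.
seq = [(0, 1), (2, 3), (1, 2)]
print([comparable_pairs(4, seq[:i]) for i in range(len(seq) + 1)])
```

In this sequence the last insertion joins two previously unrelated paths, so a single edge raises the potential by several pairs at once; this is the effect the amortized argument of Theorem 4 exploits, since the total increase is capped by $n(n-1)/2$.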

The remainder of this section provides the necessary tools step by step
to finally prove the desired bound on $\sum_{i=1}^{N} \|\delta^{(i)}\|$ in
Theorem 10. One can also interpret $\Phi_i$ as a random variable in $DAG(n, M)$
with $M = i$. The corresponding function $\Psi$ for $DAG(n,p)$ is defined as the
total number of comparable node pairs in $DAG(n,p)$. Pittel and Tungol [22] showed
the following theorem.

Theorem 5. For $p := c \log(n)/n$ and $c > 1$,
$E_p[\Psi] = (1 + o(1)) \frac{n^2}{2} \left(1 - \frac{1}{c}\right)^2$.

Using Theorem 1, this result can be transformed to $\Phi$ as defined above
for $DAG(n, M)$ and gives the following bounds for $E_M[\Phi_k]$.

Theorem 6. For $n \log n < k < N - 2n \log n$,

$$E_M[\Phi_k] = (1 + o(1)) \frac{n^2}{2} \left(1 - \frac{(n-1) \log n}{2(k + n \log n)}\right)^2.$$

For $N - 2n \log n < k < N - 2 \log n$,

$$E_M[\Phi_k] = (1 + o(1)) \frac{n^2}{2} \left(1 - \frac{(n-1) \log n}{2\left(k + \sqrt{\log n\,(N-k)}\right)}\right)^2.$$

Proof. The function $\Psi : DAG_n \to [0, N]$ and $\Psi(G) \le \Psi(H)$ whenever
$G \subseteq H$. The latter inequality is true as the nodes already ordered in $G$
will still remain ordered in $H$. For $n \log n < k < N - 2n \log n$, consider
$p := \frac{k + n \log n}{N}$. Then

$$\lim_{n\to\infty} pqN \ge \lim_{n\to\infty} \frac{\log n}{n} \cdot \frac{\log n}{n} \cdot N \ge \lim_{n\to\infty} \frac{(n-1) \log^2 n}{2n} = \infty$$

and

$$\lim_{n\to\infty} \frac{pN - k}{\sqrt{pqN}} \ge \lim_{n\to\infty} \frac{pN - k}{\sqrt{N}} \ge \lim_{n\to\infty} \frac{n \log n}{\sqrt{N}} \ge \lim_{n\to\infty} \frac{n \log n}{n} \ge \lim_{n\to\infty} \log n = \infty.$$

Since all the conditions of Theorem 1 are satisfied for these values of $k$ and
$p$, $E_M[\Psi] = O(E_p[\Psi])$. In particular,

$$E_M[\Phi_k] = E_{p = (k + n \log n)/N}[\Psi] + o(1) = (1 + o(1)) \frac{n^2}{2} \left(1 - \frac{(n-1) \log n}{2(k + n \log n)}\right)^2.$$

For $N - 2n \log n < k < N - 2 \log n$, we choose
$p := \frac{k + \sqrt{\log n\,(N-k)}}{N}$. Clearly,

$$p \ge \frac{N - 2n \log n + \sqrt{\log n\,(N - (N - 2 \log n))}}{N} = \frac{N - 2n \log n + \sqrt{2} \log n}{N}.$$

Using this, we get

$$\lim_{n\to\infty} pqN \ge \lim_{n\to\infty} \frac{N - 2n \log n + \sqrt{2} \log n}{N} \cdot \frac{N - k - \sqrt{\log n\,(N-k)}}{N} \cdot N.$$

Observe that $f(k) := N - k - \sqrt{\log n\,(N-k)}$ has its minimum at
$k_0 = N - \log(n)/4$ since $f'(k_0) = 0$ and $f''(k_0) = 2/\log n > 0$. Hence, we
conclude that $f(k)$ is monotonically decreasing in our interval
$(N - 2n \log n, N - 2 \log n)$ and attains its minimum at $N - 2 \log n$.
Therefore, $N - k - \sqrt{\log n\,(N-k)} \ge 2 \log n - \sqrt{2} \log n \to \infty$,
which in turn proves $\lim_{n\to\infty} pqN = \infty$ and

$$\lim_{n\to\infty} \frac{pN - k}{\sqrt{pqN}} \ge \lim_{n\to\infty} \frac{\sqrt{\log n\,(N-k)}}{\sqrt{N-k}} \ge \lim_{n\to\infty} \sqrt{\log n} = \infty.$$

Together with Theorem 5, this yields

$$E_M[\Phi_k] = E_{p = (k + \sqrt{\log n\,(N-k)})/N}[\Psi] + o(1) = (1 + o(1)) \frac{n^2}{2} \left(1 - \frac{(n-1) \log n}{2\left(k + \sqrt{\log n\,(N-k)}\right)}\right)^2. \qquad □$$



The degree sequence of a random graph is a well-studied problem. The
following theorem is shown in [6].

Theorem 7. If $pn / \log n \to \infty$, then almost every graph $G$ in the
$G(n,p)$ model satisfies $\Delta(G) = (1 + o(1))pn$, where $\Delta(G)$ is the
maximum degree of a node in $G$.

As noted in Section 3, the undirected graph obtained by ignoring the
directions of $DAG(n,p)$ is a $G(n,p)$ graph. Therefore, the above result
is also true for the maximum degree (in-degree + out-degree) of a node
in $DAG(n,p)$. Using Theorem 1, the above result can be transformed to
$DAG(n, M)$, as well.

Theorem 8. With probability $1 - O(\frac{1}{n})$, there is no node with degree
higher than $\frac{21M}{n}$ for sufficiently large $n$ and $M > n \log n$ in
$DAG(n, M)$.

Proof. We examine the following two functions:

• $f_1(g)$ : Number of nodes with degree at least $g(n)$

• $f_2(g) := f_1^2(g)$

For $f_1, f_2$ in $G(n,p)$, $g(n) := pn + 2\sqrt{pqn \log n}$, and some constant
$c$, Bollobás [5] showed

$$E_p[f_1(g)] = O\left(\frac{1}{n}\right) \quad \text{and} \quad E_p[f_2(g)] - E_p^2[f_1(g)] \le c \cdot E_p[f_1(g)]. \qquad (2)$$

Consider any random $DAG(n, M)$. It must have been obtained by taking
a random graph $G(n, M)$ and orienting its edges. The degree of a node in
$DAG(n, M)$ is the same as the degree of the corresponding node in $G(n, M)$.

We break down the analysis depending on $M$. At first, consider the case
$M > N - 3n \log n$. The degree of any node in an undirected graph cannot be
higher than $n - 1$. However, as $M > N - 3n \log n$,
$21 \cdot \frac{M}{n} > \frac{21}{2}(n-1) - 63 \log n$. For sufficiently large $n$
this is greater than $n - 1$ and therefore, no node can have degree higher than it.

Next, we consider $M \in (k\, n \log n, (k+1)\, n \log n]$ for $1 \le k < l$, where
$l := \left\lfloor \frac{N}{n \log n} \right\rfloor - 2$, and we prove the theorem
for each interval. We choose $p_k := (k+2) \frac{n \log n}{N}$, $q_k := 1 - p_k$,
and $g_k(n) := p_k n + 2\sqrt{p_k q_k n \log n}$ and look for the conditions in
Theorem 1. Note that $0 < p_k < 1$, $f_1 : G_n \to [0, n]$,
$f_2 : G_n \to [0, n^2]$, and $f_i(G) \le f_i(H)$ whenever $G \subseteq H$ for
$i = 1, 2$. The latter inequality holds as the degree of any node in $H$ is greater
than or equal to the corresponding degree in $G$. For $1 \le k < l$,

$$p_k \ge \frac{3n \log n}{N} = \frac{6 \log n}{n - 1}$$

and

$$q_k \ge 1 - \left(\left\lfloor \frac{N}{n \log n} \right\rfloor - 1\right) \frac{n \log n}{N} \ge 1 - \frac{N - n \log n}{n \log n} \cdot \frac{n \log n}{N} = \frac{n \log n}{N} = \frac{2 \log n}{n - 1}.$$

So for each interval,

$$\lim_{n\to\infty} p_k q_k N \ge \lim_{n\to\infty} \frac{6 \log n}{n-1} \cdot \frac{2 \log n}{n-1} \cdot N \ge \lim_{n\to\infty} 6 \log^2 n = \infty$$

and by $M \le (k+1)\, n \log n$ and $k + 2 \le \frac{N}{n \log n}$,

$$\lim_{n\to\infty} \frac{p_k N - M}{\sqrt{p_k q_k N}} \ge \lim_{n\to\infty} \frac{p_k N - M}{\sqrt{p_k N}} \ge \lim_{n\to\infty} \frac{n \log n}{\sqrt{(k+2)\, n \log n}} = \lim_{n\to\infty} \frac{\sqrt{n \log n}}{\sqrt{k+2}} \ge \lim_{n\to\infty} \frac{n \log n}{\sqrt{N}} \ge \lim_{n\to\infty} \log n = \infty.$$

In each interval, all the conditions of Theorem 1 are satisfied and therefore,
$E_M[f_i(g_k)] = E_{p_k}[f_i(g_k)] + o(1)$ for $i = 1, 2$ and $1 \le k < l$. Using
Equation (2), we get $E_M[f_1(g_k)] = O(E_{p_k}[f_1(g_k)]) = O(\frac{1}{n})$ and

$$\sigma_M^2(f_1(g_k)) = E_M[f_2(g_k)] - E_M^2[f_1(g_k)] = O\left(E_{p_k}[f_2(g_k)] - E_{p_k}^2[f_1(g_k)]\right) = O\left(\sigma_{p_k}^2(f_1(g_k))\right) = O\left(E_{p_k}[f_1(g_k)]\right) = O\left(\frac{1}{n}\right).$$

Therefore, by substituting $X := f_1(g_k)$,
$\mu := E_M[f_1(g_k)] = O(\frac{1}{n})$,
$\sigma^2 := \sigma_M^2(f_1(g_k)) = O(\frac{1}{n})$, and $\nu := 1 - \mu$ in
Chebyshev's inequality ($\Pr\{|X - \mu| \ge \nu\} \le \frac{\sigma^2}{\nu^2}$), we
get

$$\Pr\{|f_1(g_k) - \mu| \ge 1 - \mu\} \le O\left(\frac{1/n}{(1 - \mu)^2}\right) = O\left(\frac{1}{n}\right).$$

However, $\Pr\{|f_1(g_k) - \mu| \ge 1 - \mu\} = \Pr\{(f_1(g_k) \ge 1) \text{ or } (f_1(g_k) \le 2\mu - 1)\}$
and since $\mu = O(\frac{1}{n})$ and $f_1(g_k)$ is a non-negative random variable,
$\Pr\{f_1(g_k) \le 2\mu - 1\} = 0$ for sufficiently large $n$. Therefore,
$\Pr\{f_1(g_k) \ge 1\} = \Pr\{|f_1(g_k) - \mu| \ge 1 - \mu\} = O(\frac{1}{n})$. In
other words, with probability $(1 - O(\frac{1}{n}))$, there is no node with a
degree higher than $g_k$ in any interval. However, by the lower bound on $p_k$
above, we get

$$g_k(n) = p_k n + 2\sqrt{p_k q_k n \log n} \le 3 p_k n \le 6(k+2) \frac{n \log n}{n - 1}.$$

For sufficiently large $n$, $\frac{n}{n-1} \le \frac{7}{6}$, and this implies

$$g_k(n) \le 7(k+2) \log n \le \frac{7(k+2) M}{k\, n} \le \frac{21 M}{n}.$$

Therefore, with probability $1 - O(\frac{1}{n})$, there is no node with a degree
higher than $\frac{21M}{n}$ in $G(n, M)$ and by the argument above, in
$DAG(n, M)$. □
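
The degree bound of Theorem 8 is easy to probe empirically. The sketch below samples $G(n, M)$ and reports its maximum degree; the function name is hypothetical, and reading $\log$ as the natural logarithm when choosing $M > n \log n$ is an assumption.

```python
import itertools
import math
import random

def max_degree_gnm(n, M, rng):
    """Maximum degree of a uniformly random graph with n nodes and M edges."""
    all_edges = list(itertools.combinations(range(n), 2))
    degree = [0] * n
    for a, b in rng.sample(all_edges, M):
        degree[a] += 1
        degree[b] += 1
    return max(degree)

rng = random.Random(3)
n = 200
M = int(n * math.log(n)) + 1          # just above n log n
d = max_degree_gnm(n, M, rng)
print(d, 2 * M / n, 21 * M / n)       # max degree, average degree, bound
```

The maximum degree can never fall below the average degree $2M/n$, so the constant 21 in the theorem leaves roughly a factor of ten of slack over the average; the bound is chosen for convenience in the proof rather than tightness.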

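As the maximum degree of a node in $DAG(n, i)$ is $O(i/n)$, we finally just
need to show a bound on $\sum_i (i \cdot |\delta^{(i)}|)$ to prove Theorem 10. This
is done in the following theorem.

Theorem 9. For $DAG(n, M)$ and $r := N - 2 \log n$,

$$E\left[\sum_{i=1}^{r} i \cdot |\delta^{(i)}|\right] = O(n^3 \log^2 n).$$

Proof. Let us decompose the analysis in three steps. First, we show a bound
on the first $n \log n$ edges. By definition of $\delta^{(i)}$,
$|\delta^{(i)}| \le n$. Therefore,

$$\sum_{i=1}^{n \log n} i \cdot E[|\delta^{(i)}|] \le \sum_{i=1}^{n \log n} i \cdot n = O(n^3 \log^2 n). \qquad (3)$$

The second step is to bound $\sum_{i = n \log n}^{t} i \cdot |\delta^{(i)}|$ with
$t := N - 2n \log n$. For this, Theorem 3 (ii) shows for all $k$ such that
$n \log n < k < t$ that

$$E\left[\sum_{i=k}^{t} |\delta^{(i)}|\right] \le 2E\left[\sum_{i=k}^{t} \Delta\Phi_i\right] = 2E[\Phi_t - \Phi_{k-1}] = 2E[\Phi_t] - 2E[\Phi_{k-1}]. \qquad (4)$$

The function hidden in the $o(1)$ in Theorem 5 is decreasing in $p$ [22]. Hence,
also the $o(1)$ in Theorem 6 is decreasing in $k$. Plugging this in Equation (4)
yields (with $s := n \log n$)

$$E\left[\sum_{i=k}^{t} |\delta^{(i)}|\right] \le (1 + o(1))\, n^2 \left( \left(1 - \frac{(n-1) \log n}{2(t+s)}\right)^2 - \left(1 - \frac{(n-1) \log n}{2(k-1+s)}\right)^2 \right)$$

$$= (1 + o(1))\, n^2 (n-1) \log n \left( \frac{2}{2(k-1+s)} - \frac{2}{2(t+s)} + \frac{(n-1) \log n}{4} \left( \frac{1}{(t+s)^2} - \frac{1}{(k-1+s)^2} \right) \right)$$

$$\le (1 + o(1))\, n^2 (n-1) \log n \left( \frac{1}{k-1+s} - \frac{1}{t+s} \right)$$

$$\le (1 + o(1))\, n^2 (n-1) \log n \cdot \frac{1}{k-1}. \qquad (5)$$

By linearity of expectation and Equation (5),

$$E\left[\sum_{i=s+1}^{t} i \cdot |\delta^{(i)}|\right] = \sum_{i=s+1}^{t} \left(i \cdot E[|\delta^{(i)}|]\right) \le \sum_{j=1}^{\lceil \log(t/s) \rceil} \sum_{i=2^{j-1}s+1}^{2^j s} \left(i \cdot E[|\delta^{(i)}|]\right)$$

$$\le \sum_{j=1}^{\lceil \log(t/s) \rceil} \left( 2^j s \cdot E\left[\sum_{i=2^{j-1}s+1}^{t} |\delta^{(i)}|\right] \right)$$

$$\le \sum_{j=1}^{\lceil \log(t/s) \rceil} 2^j s\, (1 + o(1))\, n^2 (n-1) \log n \cdot \frac{1}{2^{j-1} s}$$

$$= \sum_{j=1}^{\lceil \log(t/s) \rceil} \left( 2(1 + o(1))\, n^2 (n-1) \log n \right)$$

$$\le 2(1 + o(1))\, n^2 (n-1) \log^2 n = O(n^3 \log^2 n).$$

For the last step consider a $k$ such that $t < k < r$. Theorem 3 (ii) gives

$$E\left[\sum_{i=k}^{r} |\delta^{(i)}|\right] \le 2E\left[\sum_{i=k}^{r} \Delta\Phi_i\right] = 2E[\Phi_r - \Phi_{k-1}] = 2E[\Phi_r] - 2E[\Phi_{k-1}].$$

Using Theorem 6 and similar arguments as before, this yields (with
$s(k) := \sqrt{\log n\,(N-k)}$)

$$E\left[\sum_{i=k}^{r} |\delta^{(i)}|\right] \le (1 + o(1))\, n^2 \left( \left(1 - \frac{(n-1) \log n}{2(r + s(r))}\right)^2 - \left(1 - \frac{(n-1) \log n}{2(k-1+s(k-1))}\right)^2 \right)$$

$$= (1 + o(1))\, n^2 (n-1) \log n \left( \frac{2}{2(k-1+s(k-1))} - \frac{2}{2(r+s(r))} + \frac{(n-1) \log n}{4} \left( \frac{1}{(r+s(r))^2} - \frac{1}{(k-1+s(k-1))^2} \right) \right).$$

Since $k + s(k)$ is monotonically increasing for $t < k < r$,
$\frac{1}{(k+s(k))^2}$ is a monotonically decreasing function in this interval.
Therefore, $\frac{1}{(r+s(r))^2} - \frac{1}{(k-1+s(k-1))^2} \le 0$, which proves
the following equation.

$$E\left[\sum_{i=k}^{r} |\delta^{(i)}|\right] \le (1 + o(1))\, n^2 (n-1) \log n \left( \frac{1}{k-1+s(k-1)} - \frac{1}{r+s(r)} \right) \le (1 + o(1))\, n^2 (n-1) \log n \cdot \frac{1}{k-1}. \qquad (6)$$

By linearity of expectation and Equation (6),

$$E\left[\sum_{i=N-2n\log n+1}^{r} i \cdot |\delta^{(i)}|\right] = \sum_{i=N-2n\log n+1}^{r} i \cdot E[|\delta^{(i)}|] \le (N - 2 \log n) \sum_{i=N-2n\log n+1}^{r} E[|\delta^{(i)}|]$$

$$\le (N - 2 \log n)(1 + o(1))\, n^2 (n-1) \log n \cdot \frac{1}{N - 2n \log n - 1} = O(n^3 \log n). \qquad □$$

Theorem 10. For $DAG(n, M)$,
$E\left[\sum_{i=1}^{N} \|\delta^{(i)}\|\right] = O(n^2 \log^2 n)$.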



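Proof. By definition of $\|\delta^{(i)}\|$, we know $\|\delta^{(i)}\| \le i$ and
hence

$$E\left[\sum_{i=1}^{n \log n} \|\delta^{(i)}\|\right] = O(n^2 \log^2 n).$$

Again, let $r := N - 2 \log n$. Theorem 8 tells us that with probability
greater than $(1 - \frac{c'}{n})$ for some constant $c'$, there is no node with
degree $> \frac{c\, i}{n}$ (for $c = 21$). Since the degree of an arbitrary node in
a DAG is bounded by $n$, we get with Theorems 4 and 9,

$$E\left[\sum_{i=n \log n+1}^{r} \|\delta^{(i)}\|\right] \le E\left[\sum_{i=n \log n+1}^{r} \frac{c\, i}{n} |\delta^{(i)}|\right] + \frac{c'}{n} E\left[\sum_{i=n \log n+1}^{r} n\, |\delta^{(i)}|\right]$$

$$= O\left( \frac{1}{n} E\left[\sum_{i=n \log n+1}^{r} i\, |\delta^{(i)}|\right] + E\left[\sum_{i=1}^{N} |\delta^{(i)}|\right] \right) = O\left( \frac{1}{n} (n^3 \log^2 n) + n^2 \right) = O(n^2 \log^2 n).$$

By again using the fact that the degree of an arbitrary node in a DAG is at
most $n$, we obtain

$$E\left[\sum_{i=r+1}^{N} \|\delta^{(i)}\|\right] = O\left( n \cdot E\left[\sum_{i=r+1}^{N} |\delta^{(i)}|\right] \right) = O\left( n \cdot \sum_{i=r+1}^{N} n \right) = O(n^2 \log n).$$

Thus,

$$E\left[\sum_{i=1}^{N} \|\delta^{(i)}\|\right] = E\left[\sum_{i=1}^{n \log n} \|\delta^{(i)}\|\right] + E\left[\sum_{i=n \log n+1}^{r} \|\delta^{(i)}\|\right] + E\left[\sum_{i=r+1}^{N} \|\delta^{(i)}\|\right]$$

$$= O(n^2 \log^2 n) + O(n^2 \log^2 n) + O(n^2 \log n) = O(n^2 \log^2 n). \qquad □$$

5 Analysis of AHRSZ and KB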

Katriel and Bodlaender [14] introduced KB as a variant of AHRSZ for which
a worst-case runtime of $O(\min\{m^{3/2} \log n, m^{3/2} + n^2 \log n\})$ can be
shown. In this section, we prove an expected runtime of $O(n^2 \log^3 n)$ under
random edge insertion sequences, both for AHRSZ and KB.

Recall from Section 2 that for every edge insertion there is a minimal
cover $K_{\min}^{(i)}$. The following theorem shows that $\delta^{(i)}$ is also a
valid cover in this situation.

Theorem 11. $\delta^{(i)}$ is a valid cover.

Proof. Consider the insertion of the $i$-th edge $(u, v)$ and consider a node pair
$x, y$ such that $x \leadsto y$, but $x > y$. Since before the insertion of this
edge, the topological ordering was consistent,
$x \leadsto u \to v \leadsto y$, $x \le u$ and $v \le y$. Together with $x > y$, it
implies $x > v$. Now $x \leadsto u$ and $x > v$ imply $x \in \delta^{(i)}$.
Thus, for every node pair $(x, y)$ such that $x \leadsto y$ and $x > y$,
$x \in \delta^{(i)}$ and hence, $\delta^{(i)}$ is a valid cover. □

Therefore, by definition of $|\rangle K_{\min}^{(i)} \langle|$,
$|\rangle K_{\min}^{(i)} \langle| \le |\rangle \delta^{(i)} \langle| = |\delta^{(i)}| + \|\delta^{(i)}\|$.

$$E\left[\sum_{i=1}^{N} |\rangle K_{\min}^{(i)} \langle|\right] \le E\left[\sum_{i=1}^{N} |\delta^{(i)}|\right] + E\left[\sum_{i=1}^{N} \|\delta^{(i)}\|\right] = O(n^2 \log^2 n).$$

The latter equality follows from Theorems 4 and 10. The expected complexity
of AHRSZ on REIS is thus
$O\left(E\left[\sum_{i=1}^{N} |\rangle K_{\min}^{(i)} \langle| \log n\right]\right) = O(n^2 \log^3 n)$.

KB also computes a cover $K \subseteq \delta^{(i)}$ and its complexity per edge
insertion is $O(|\rangle K \langle| \log |\rangle K \langle|)$. Therefore,
$|\rangle K \langle| \le |\delta^{(i)}| + \|\delta^{(i)}\|$ and with a similar
argument as above, the expected complexity of KB on REIS is $O(n^2 \log^3 n)$.



6 Bounding the number of invalidating edges 

An interesting question in all this analysis is how many edges will actually
invalidate the topological ordering and force any algorithm to do something
about them. Here, we show a non-trivial upper bound on the expected
value of the number of invalidating edges on REIS. Consider the following
random variable: INVAL($i$) $= 1$ if the $i$-th edge inserted is an invalidating
edge; INVAL($i$) $= 0$ otherwise.

Theorem 12. $E\left[\sum_{i=1}^{m} \text{INVAL}(i)\right] = O(\min\{m, n^{3/2} \log^{1/2} n\})$.

Proof. If the $i$-th edge is invalidating, $|\delta^{(i)}| \ge 2$; otherwise
INVAL($i$) $= |\delta^{(i)}| = 0$. In either case,
INVAL($i$) $\le |\delta^{(i)}|/2$. Thus, for $s := n^{3/2} \log^{1/2} n$ and
$t := \min\{m, N - 2n \log n\}$,

$$E\left[\sum_{i=s+1}^{t} \text{INVAL}(i)\right] \le E\left[\sum_{i=s+1}^{t} \frac{|\delta^{(i)}|}{2}\right] \le (1 + o(1)) \frac{n^2 (n-1) \log n}{2 \cdot n^{3/2} \log^{1/2} n} = O(n^{3/2} \log^{1/2} n).$$

The second inequality follows by substituting $k := s + 1$ in Equation (5).
Also, since the number of invalidating edges can be at most equal to the
total number of edges, $\sum_{i=1}^{s} \text{INVAL}(i) \le s$.

$$E\left[\sum_{i=1}^{m} \text{INVAL}(i)\right] = E\left[\sum_{i=1}^{s} \text{INVAL}(i)\right] + E\left[\sum_{i=s+1}^{t} \text{INVAL}(i)\right] + E\left[\sum_{i=t+1}^{m} \text{INVAL}(i)\right]$$

$$\le O(s) + O(n^{3/2} \log^{1/2} n) + O(n \log n) = O(n^{3/2} \log^{1/2} n).$$

The second bound $E\left[\sum_{i=1}^{m} \text{INVAL}(i)\right] \le m$ is obvious by
definition of INVAL($i$). □
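
Since an edge is invalidating with respect to whatever valid order the algorithm currently holds, the count $\sum_i \text{INVAL}(i)$ can be measured with any correct order-maintenance routine. The sketch below uses a simple forward-shift update chosen for brevity (it is not AHRSZ, KB, or PK); all function names and the hidden random relabeling of the nodes are assumptions made for this illustration.

```python
import itertools
import random

def random_insertion_sequence(n, rng):
    """REIS on a complete DAG whose topological order `label` is hidden."""
    label = list(range(n))
    rng.shuffle(label)
    seq = [(label[i], label[j]) for i, j in itertools.combinations(range(n), 2)]
    rng.shuffle(seq)
    return label, seq

def count_invalidating(n, edge_sequence):
    """Insert edges one by one; return (#invalidating edges, final order)."""
    order = list(range(n))   # order[p] = node at position p
    pos = list(range(n))     # pos[x] = position of node x
    succ = [[] for _ in range(n)]
    invalidating = 0
    for u, v in edge_sequence:
        if pos[u] > pos[v]:  # INVAL(i) = 1
            invalidating += 1
            lo, hi = pos[v], pos[u]
            # Forward search from v, restricted to positions <= pos[u].
            reach, stack = {v}, [v]
            while stack:
                for y in succ[stack.pop()]:
                    if y not in reach and pos[y] <= hi:
                        reach.add(y)
                        stack.append(y)
            if u in reach:
                raise ValueError("edge would create a cycle")
            # Shift the reached nodes behind the rest of the window [lo, hi].
            window = order[lo:hi + 1]
            order[lo:hi + 1] = ([x for x in window if x not in reach]
                                + [x for x in window if x in reach])
            for p in range(lo, hi + 1):
                pos[order[p]] = p
        succ[u].append(v)
    return invalidating, order
```

For a complete DAG the final order is unique, so comparing the returned order with the hidden `label` gives a quick correctness check; averaging the returned count over many sequences yields an empirical estimate to set against the bound of Theorem 12.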



7 Empirical observations 

In addition to the achieved average-case bounds, we also examined AHRSZ
and PK experimentally using the implementation of David J. Pearce [19]
available from www.mcs.vuw.ac.nz/~djp/dts.html. For varying numbers of
vertices $n = 100, 200, \ldots, 10000$, we generated random edge insertion
sequences (REIS) leading to complete DAGs and averaged the performance
parameter $C(n)$ over 250 runs. The chosen $C(n)$ upper bounds the respective
runtimes.

The performance parameter taken for AHRSZ is
$C(n) := \sum_{i=1}^{N} |\rangle K^{(i)} \langle| \log(|\rangle K^{(i)} \langle|)$.
We know $E[C(n)] = O(n^2 \log^3 n)$ from Section 5 and know that the overall
runtime is $\Omega(n^2)$ since the algorithm has to inspect all the edges being
inserted. In our experimental setting, we discovered that $C(n)/(n^2 \log^2 n)$ is
apparently a decreasing function and that $C(n)/(n^2 \log n)$ is an increasing
function. This empirical evidence suggests that $C(n)$ is possibly between
$\Omega(n^2 \log n)$ and $O(n^2 \log^2 n)$. Figure 1 shows our experimental
results for AHRSZ.

Figure 1. Experimental results of AHRSZ for the insertion of the edges
of a complete DAG in a random order. The horizontal axes describe the
number of vertices $n$. The vertical axes show the measured empirical insertion
costs $C(n) := \sum_{i=1}^{N} |\rangle K^{(i)} \langle| \log |\rangle K^{(i)} \langle|$
relative to (a) $n^2 \log n$ and (b) $n^2 \log^2 n$, respectively. The error bars
specify the sample standard deviation.

We consider
$C(n) := \sum_{i=1}^{N} \left(\|\delta^{(i)}\| + |\delta^{(i)}| \log |\delta^{(i)}|\right)$
as a performance parameter for PK and observe that $C(n)/n^2$ is decreasing while
$C(n)/(n^2 \log^{-1} n)$ is increasing. This indicates that $C(n) = o(n^2)$, which
implies an actual runtime of $\Theta(n^2)$ for PK on REIS since all $\Omega(n^2)$
edges have to be inspected. Pearce and Kelly [19] showed empirically that PK
outperforms AHRSZ on sparse DAGs. Our experiments extend this to dense DAGs.

Complementing Section 6, we also examined empirically the number of
invalidating edges for AHRSZ. The same experimental set-up as above suggests
a quasilinear growth of $\sum_{i=1}^{N} \text{INVAL}(i)$ between
$\Omega(n \log n)$ and $O(n \log^2 n)$. Note that the observed empirical bound for
AHRSZ is significantly lower than the general bound $O(n^{3/2} \log^{1/2} n)$ of
Theorem 12 which holds for all algorithms.

8 Discussion

On random edge insertion sequences (REIS) leading to a complete DAG, we
have shown an expected runtime of $O(n^2 \log^2 n)$ for PK and
$O(n^2 \log^3 n)$ for AHRSZ and KB while the trivial lower bound is
$\Omega(n^2)$. Extending the average-case analysis to the case where we only
insert $m$ edges with $m \ll n^2$ still remains open. On the other hand, the only
non-trivial lower bound for this problem is by Ramalingam and Reps [23], who have
shown that an adversary can force any algorithm which maintains explicit labels
to require $\Omega(n \log n)$ time complexity for inserting $n - 1$ edges.
There is still a large gap between the lower bound of
$\Omega(\max\{n \log n, m\})$, the best average-case bound of
$O(n^2 \log^2 n)$ and the worst-case bound of
$O(\min\{m^{1.5} + n^2 \log n, m^{1.5} \log n, n^{2.75}\})$. Bridging this gap
remains an open problem.